{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "cellView": "form",
        "id": "C1rAsD2L-hSO"
      },
      "outputs": [],
      "source": [
        "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
        "\n",
        "# Licensed to the Apache Software Foundation (ASF) under one\n",
        "# or more contributor license agreements. See the NOTICE file\n",
        "# distributed with this work for additional information\n",
        "# regarding copyright ownership. The ASF licenses this file\n",
        "# to you under the Apache License, Version 2.0 (the\n",
        "# \"License\"); you may not use this file except in compliance\n",
        "# with the License. You may obtain a copy of the License at\n",
        "#\n",
        "#   http://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing,\n",
        "# software distributed under the License is distributed on an\n",
        "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
        "# KIND, either express or implied. See the License for the\n",
        "# specific language governing permissions and limitations\n",
        "# under the License"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "b6f8f3af-744e-4eaa-8a30-6d03e8e4d21e"
      },
      "source": [
        "# Bring your own ML model to Beam RunInference\n",
        "\n",
        "<table align=\"left\">\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://colab.research.google.com/github/apache/beam/blob/master/examples/notebooks/beam-ml/run_custom_inference.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/colab_32px.png\" />Run in Google Colab</a>\n",
        "  </td>\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://github.com/apache/beam/blob/master/examples/notebooks/beam-ml/run_custom_inference.ipynb\"><img src=\"https://raw.githubusercontent.com/google/or-tools/main/tools/github_32px.png\" />View source on GitHub</a>\n",
        "  </td>\n",
        "</table>\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "A8xNRyZMW1yK"
      },
      "source": [
        "This notebook demonstrates how to run inference on your custom framework using the\n",
        "[ModelHandler](https://beam.apache.org/releases/pydoc/current/apache_beam.ml.inference.base.html#apache_beam.ml.inference.base.ModelHandler) class.\n",
        "\n",
        "Named-entity recognition (NER) is one of the most common tasks for natural language processing (NLP). \n",
        "NLP locates named entities in unstructured text and classifies the entities using pre-defined labels, such as person name, organization, date, and so on.\n",
        "\n",
        "This example illustrates how to use the popular `spaCy` package to load a machine learning (ML) model and perform inference in an Apache Beam pipeline using the RunInference `PTransform`.\n",
        "For more information about the RunInference API, see [About Beam ML](https://beam.apache.org/documentation/ml/about-ml) in the Apache Beam documentation."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "299af9bb-b2fc-405c-96e7-ee0a6ae24bdd"
      },
      "source": [
        "## Install package dependencies\n",
        "\n",
        "The RunInference library is available in Apache Beam versions 2.40 and later.\n",
        "\n",
        "For this example, you need to install `spaCy` and `pandas`. A small NER model, `en_core_web_sm`, is also installed, but you can use any valid `spaCy` model."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "7f841596-f217-46d2-b64e-1952db4de4cb",
        "outputId": "da04ccb9-0801-47f6-ec9e-e87f0ca4569f"
      },
      "outputs": [],
      "source": [
        "# Uncomment the following lines to install the required packages.\n",
        "# %pip install spacy pandas\n",
        "# %pip install \"apache-beam[gcp, dataframe, interactive]\"\n",
        "# !python -m spacy download en_core_web_sm"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7f841596-f217-46d2-b64e-1952db4de4cc"
      },
      "source": [
        "## Learn about `spaCy`\n",
        "\n",
        "To learn more about `spaCy`, create a `spaCy` language object in memory using `spaCy`'s trained models.\n",
        "You can install these models as Python packages.\n",
        "For more information, see spaCy's [Models and Languages](https://spacy.io/usage/models) documentation."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "id": "7f841596-f217-46d2-b64e-1952db4de4cd"
      },
      "outputs": [],
      "source": [
        "import spacy\n",
        "\n",
        "nlp = spacy.load(\"en_core_web_sm\")\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {
        "id": "7f841596-f217-46d2-b64e-1952db4de4ce"
      },
      "outputs": [],
      "source": [
        "# Add text strings.\n",
        "text_strings = [\n",
        "    \"The New York Times is an American daily newspaper based in New York City with a worldwide readership.\",\n",
        "    \"It was founded in 1851 by Henry Jarvis Raymond and George Jones, and was initially published by Raymond, Jones & Company.\"\n",
        "]\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "id": "7f841596-f217-46d2-b64e-1952db4de4cf"
      },
      "outputs": [],
      "source": [
        "# Check which entities spaCy can recognize.\n",
        "doc = nlp(text_strings[0])\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "id": "7f841596-f217-46d2-b64e-1952db4de4d0"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "The New York Times 0 18 ORG\n",
            "American 25 33 NORP\n",
            "daily 34 39 DATE\n",
            "New York City 59 72 GPE\n"
          ]
        }
      ],
      "source": [
        "for ent in doc.ents:\n",
        "    print(ent.text, ent.start_char, ent.end_char, ent.label_)\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "id": "7f841596-f217-46d2-b64e-1952db4de4d1"
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "<span class=\"tex2jax_ignore\"><div class=\"entities\" style=\"line-height: 2.5; direction: ltr\">\n",
              "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
              "    The New York Times\n",
              "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
              "</mark>\n",
              " is an \n",
              "<mark class=\"entity\" style=\"background: #c887fb; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
              "    American\n",
              "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">NORP</span>\n",
              "</mark>\n",
              " \n",
              "<mark class=\"entity\" style=\"background: #bfe1d9; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
              "    daily\n",
              "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">DATE</span>\n",
              "</mark>\n",
              " newspaper based in \n",
              "<mark class=\"entity\" style=\"background: #feca74; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
              "    New York City\n",
              "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">GPE</span>\n",
              "</mark>\n",
              " with a worldwide readership.</div></span>"
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ],
      "source": [
        "# Visualize the results.\n",
        "from spacy import displacy\n",
        "displacy.render(doc, style=\"ent\")\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {
        "id": "7f841596-f217-46d2-b64e-1952db4de4e0"
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "<span class=\"tex2jax_ignore\"><div class=\"entities\" style=\"line-height: 2.5; direction: ltr\">It was founded in \n",
              "<mark class=\"entity\" style=\"background: #bfe1d9; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
              "    1851\n",
              "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">DATE</span>\n",
              "</mark>\n",
              " by \n",
              "<mark class=\"entity\" style=\"background: #aa9cfc; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
              "    Henry Jarvis\n",
              "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">PERSON</span>\n",
              "</mark>\n",
              " \n",
              "<mark class=\"entity\" style=\"background: #aa9cfc; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
              "    Raymond\n",
              "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">PERSON</span>\n",
              "</mark>\n",
              " and \n",
              "<mark class=\"entity\" style=\"background: #aa9cfc; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
              "    George Jones\n",
              "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">PERSON</span>\n",
              "</mark>\n",
              ", and was initially published by \n",
              "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
              "    Raymond, Jones &amp; Company\n",
              "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
              "</mark>\n",
              ".</div></span>"
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ],
      "source": [
        "# Visualize another example.\n",
        "displacy.render(nlp(text_strings[1]), style=\"ent\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7f841596-f217-46d2-b64e-1952db4de4e1"
      },
      "source": [
        "## Create a model handler\n",
        "\n",
        "This section demonstrates how to create your own `ModelHandler` so that you can use `spaCy` for inference."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {
        "id": "7f841596-f217-46d2-b64e-1952db4de4e2"
      },
      "outputs": [
        {
          "data": {
            "application/javascript": "\n        if (typeof window.interactive_beam_jquery == 'undefined') {\n          var jqueryScript = document.createElement('script');\n          jqueryScript.src = 'https://code.jquery.com/jquery-3.4.1.slim.min.js';\n          jqueryScript.type = 'text/javascript';\n          jqueryScript.onload = function() {\n            var datatableScript = document.createElement('script');\n            datatableScript.src = 'https://cdn.datatables.net/1.10.20/js/jquery.dataTables.min.js';\n            datatableScript.type = 'text/javascript';\n            datatableScript.onload = function() {\n              window.interactive_beam_jquery = jQuery.noConflict(true);\n              window.interactive_beam_jquery(document).ready(function($){\n                \n              });\n            }\n            document.head.appendChild(datatableScript);\n          };\n          document.head.appendChild(jqueryScript);\n        } else {\n          window.interactive_beam_jquery(document).ready(function($){\n            \n          });\n        }"
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "The New York Times is an American daily newspaper based in New York City with a worldwide readership.\n",
            "It was founded in 1851 by Henry Jarvis Raymond and George Jones, and was initially published by Raymond, Jones & Company.\n"
          ]
        }
      ],
      "source": [
        "\n",
        "import apache_beam as beam\n",
        "from apache_beam.options.pipeline_options import PipelineOptions\n",
        "\n",
        "import warnings\n",
        "warnings.filterwarnings(\"ignore\")\n",
        "\n",
        "\n",
        "pipeline = beam.Pipeline()\n",
        "\n",
        "# Print the results for verification.\n",
        "with pipeline as p:\n",
        "    (p \n",
        "    | \"CreateSentences\" >> beam.Create(text_strings)\n",
        "    | beam.Map(print)\n",
        "    )\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {
        "id": "7f841596-f217-46d2-b64e-1952db4de4e3"
      },
      "outputs": [],
      "source": [
        "# Define `SpacyModelHandler` to load the model and perform the inference.\n",
        "\n",
        "from apache_beam.ml.inference.base import RunInference\n",
        "from apache_beam.ml.inference.base import ModelHandler\n",
        "from apache_beam.ml.inference.base import PredictionResult\n",
        "from spacy import Language\n",
        "from typing import Any\n",
        "from typing import Dict\n",
        "from typing import Iterable\n",
        "from typing import Optional\n",
        "from typing import Sequence\n",
        "\n",
        "class SpacyModelHandler(ModelHandler[str,\n",
        "                                     PredictionResult,\n",
        "                                     Language]):\n",
        "    def __init__(\n",
        "        self,\n",
        "        model_name: str = \"en_core_web_sm\",\n",
        "    ):\n",
        "        \"\"\" Implementation of the ModelHandler interface for spaCy using text as input.\n",
        "\n",
        "        Example Usage::\n",
        "\n",
        "          pcoll | RunInference(SpacyModelHandler())\n",
        "\n",
        "        Args:\n",
        "          model_name: The spaCy model name. Default is en_core_web_sm.\n",
        "        \"\"\"\n",
        "        self._model_name = model_name\n",
        "        self._env_vars = {}\n",
        "\n",
        "    def load_model(self) -> Language:\n",
        "        \"\"\"Loads and initializes a model for processing.\"\"\"\n",
        "        return spacy.load(self._model_name)\n",
        "\n",
        "    def run_inference(\n",
        "        self,\n",
        "        batch: Sequence[str],\n",
        "        model: Language,\n",
        "        inference_args: Optional[Dict[str, Any]] = None\n",
        "    ) -> Iterable[PredictionResult]:\n",
        "        \"\"\"Runs inferences on a batch of text strings.\n",
        "\n",
        "        Args:\n",
        "          batch: A sequence of examples as text strings. \n",
        "          model: A spaCy language model\n",
        "          inference_args: Any additional arguments for an inference.\n",
        "\n",
        "        Returns:\n",
        "          An Iterable of type PredictionResult.\n",
        "        \"\"\"\n",
        "        # Loop each text string, and use a tuple to store the inference results.\n",
        "        predictions = []\n",
        "        for one_text in batch:\n",
        "            doc = model(one_text)\n",
        "            predictions.append(\n",
        "                [(ent.text, ent.start_char, ent.end_char, ent.label_) for ent in doc.ents])\n",
        "        return [PredictionResult(x, y) for x, y in zip(batch, predictions)]\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {
        "id": "7f841596-f217-46d2-b64e-1952db4de4e4"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "The New York Times is an American daily newspaper based in New York City with a worldwide readership.\n",
            "It was founded in 1851 by Henry Jarvis Raymond and George Jones, and was initially published by Raymond, Jones & Company.\n",
            "PredictionResult(example='The New York Times is an American daily newspaper based in New York City with a worldwide readership.', inference=[('The New York Times', 0, 18, 'ORG'), ('American', 25, 33, 'NORP'), ('daily', 34, 39, 'DATE'), ('New York City', 59, 72, 'GPE')])\n",
            "PredictionResult(example='It was founded in 1851 by Henry Jarvis Raymond and George Jones, and was initially published by Raymond, Jones & Company.', inference=[('1851', 18, 22, 'DATE'), ('Henry Jarvis', 26, 38, 'PERSON'), ('Raymond', 39, 46, 'PERSON'), ('George Jones', 51, 63, 'PERSON'), ('Raymond, Jones & Company', 96, 120, 'ORG')])\n"
          ]
        }
      ],
      "source": [
        "# Verify that the inference results are correct.\n",
        "with pipeline as p:\n",
        "    (p \n",
        "    | \"CreateSentences\" >> beam.Create(text_strings)\n",
        "    | \"RunInferenceSpacy\" >> RunInference(SpacyModelHandler(\"en_core_web_sm\"))\n",
        "    | beam.Map(print)\n",
        "    )\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7f841596-f217-46d2-b64e-1952db4de4e5"
      },
      "source": [
        "## Use `KeyedModelHandler` to handle keyed data\n",
        "\n",
        "If you have keyed data, use `KeyedModelHandler`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {
        "id": "7f841596-f217-46d2-b64e-1952db4de4e6"
      },
      "outputs": [],
      "source": [
        "# You can use these text strings with keys to distinguish examples.\n",
        "text_strings_with_keys = [\n",
        "    (\"example_0\", \"The New York Times is an American daily newspaper based in New York City with a worldwide readership.\"),\n",
        "    (\"example_1\", \"It was founded in 1851 by Henry Jarvis Raymond and George Jones, and was initially published by Raymond, Jones & Company.\")\n",
        "]\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 13,
      "metadata": {
        "id": "7f841596-f217-46d2-b64e-1952db4de4e7"
      },
      "outputs": [],
      "source": [
        "from apache_beam.runners.interactive.interactive_runner import InteractiveRunner\n",
        "from apache_beam.ml.inference.base import KeyedModelHandler\n",
        "from apache_beam.dataframe.convert import to_dataframe\n",
        "\n",
        "pipeline = beam.Pipeline(InteractiveRunner())\n",
        "\n",
        "keyed_spacy_model_handler = KeyedModelHandler(SpacyModelHandler(\"en_core_web_sm\"))\n",
        "\n",
        "# Verify that the inference results are correct.\n",
        "with pipeline as p:\n",
        "    results = (p \n",
        "    | \"CreateSentences\" >> beam.Create(text_strings_with_keys)\n",
        "    | \"RunInferenceSpacy\" >> RunInference(keyed_spacy_model_handler)\n",
        "    # Generate a schema suitable for conversion to a dataframe using Map to Row objects.\n",
        "    | 'ToRows' >> beam.Map(lambda row: beam.Row(key=row[0], text=row[1][0], predictions=row[1][1]))\n",
        "    )"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 14,
      "metadata": {
        "id": "7f841596-f217-46d2-b64e-1952db4de4e8"
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "\n",
              "            <link rel=\"stylesheet\" href=\"https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min.css\" integrity=\"sha384-Vkoo8x4CGsO3+Hhxv8T/Q5PaXtkKtu6ug5TOeNV6gBiFeWPGFN9MuhOf23Q9Ifjh\" crossorigin=\"anonymous\">\n",
              "            <div id=\"progress_indicator_25aaf10d571c025a28901bea46b94c93\">\n",
              "              <div class=\"spinner-border text-info\" role=\"status\"></div>\n",
              "              <span class=\"text-info\">Processing... collect</span>\n",
              "            </div>\n",
              "            "
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/javascript": "\n        if (typeof window.interactive_beam_jquery == 'undefined') {\n          var jqueryScript = document.createElement('script');\n          jqueryScript.src = 'https://code.jquery.com/jquery-3.4.1.slim.min.js';\n          jqueryScript.type = 'text/javascript';\n          jqueryScript.onload = function() {\n            var datatableScript = document.createElement('script');\n            datatableScript.src = 'https://cdn.datatables.net/1.10.20/js/jquery.dataTables.min.js';\n            datatableScript.type = 'text/javascript';\n            datatableScript.onload = function() {\n              window.interactive_beam_jquery = jQuery.noConflict(true);\n              window.interactive_beam_jquery(document).ready(function($){\n                \n            $(\"#progress_indicator_25aaf10d571c025a28901bea46b94c93\").remove();\n              });\n            }\n            document.head.appendChild(datatableScript);\n          };\n          document.head.appendChild(jqueryScript);\n        } else {\n          window.interactive_beam_jquery(document).ready(function($){\n            \n            $(\"#progress_indicator_25aaf10d571c025a28901bea46b94c93\").remove();\n          });\n        }"
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ],
      "source": [
        "# Convert the results to a pandas dataframe.\n",
        "import apache_beam.runners.interactive.interactive_beam as ib\n",
        "\n",
        "beam_df = to_dataframe(results)\n",
        "df = ib.collect(beam_df)\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 15,
      "metadata": {
        "id": "7f841596-f217-46d2-b64e-1952db4de4e9"
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>key</th>\n",
              "      <th>text</th>\n",
              "      <th>predictions</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>example_0</td>\n",
              "      <td>The New York Times is an American daily newspa...</td>\n",
              "      <td>[(The New York Times, 0, 18, ORG), (American, ...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>example_1</td>\n",
              "      <td>It was founded in 1851 by Henry Jarvis Raymond...</td>\n",
              "      <td>[(1851, 18, 22, DATE), (Henry Jarvis, 26, 38, ...</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "         key                                               text  \\\n",
              "0  example_0  The New York Times is an American daily newspa...   \n",
              "0  example_1  It was founded in 1851 by Henry Jarvis Raymond...   \n",
              "\n",
              "                                         predictions  \n",
              "0  [(The New York Times, 0, 18, ORG), (American, ...  \n",
              "0  [(1851, 18, 22, DATE), (Henry Jarvis, 26, 38, ...  "
            ]
          },
          "execution_count": 15,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "df"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "7f841596-f217-46d2-b64e-1952db4de4f0"
      },
      "outputs": [],
      "source": []
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "name": "Beam RunInference",
      "provenance": [],
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3.9.13 ('venv': venv)",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.9.13"
    },
    "vscode": {
      "interpreter": {
        "hash": "aab5fceeb08468f7e142944162550e82df74df803ff2eb1987d9526d4285522f"
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}
